Reviewer Integration and Performance Measurement for Malware Detection
نویسندگان
چکیده
We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system’s ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparable to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.
منابع مشابه
DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers
To control the exponential growth of malware files, security analysts pursue dynamic approaches that automatically identify and analyze malicious software samples. Obfuscation and polymorphism employed by malwares make it difficult for signature-based systems to detect sophisticated malware files. The dynamic analysis or run-time behavior provides a better technique to identify the threat. In t...
متن کاملMalware Detection using Classification of Variable-Length Sequences
In this paper, a novel method based on the graph is proposed to classify the sequence of variable length as feature extraction. The proposed method overcomes the problems of the traditional graph with variable length of data, without fixing length of sequences, by determining the most frequent instructions and insertion the rest of instructions on the set of “other”, save speed and memory. Acco...
متن کاملPractical Experiences with Purenet, a Self-Learning Malware Prevention System
This paper introduces Purenet, which is a self-learning malware detection system aimed at avoiding zero-day attacks and other delays in patching application systems when attacks are identified. The concept and architecture of Purenet are described, specifically positioning anomaly detection as the system enabler. Deployment of the system in an operational environment is discussed, and associate...
متن کاملPerformance measurement of detection and governmental punitive agency by integrated approach based on DEMATEL, ANP, and DEA
The detection and governmental punitive agency is responsible for supervision on correct implementation of the business laws in Iran. Several criteria are involved in performance assessment of detection and governmental punitive agency. The interaction between criteria and sub-criteria may occur in real situations. Moreover, the performance measurement should be accomplished in a multi-period h...
متن کاملProbabilistic and Considerate Attestation of IoT Devices against Roving Malware
Remote Attestation (RA) is a popular means of detecting malware presence (or verifying its absence) on embedded and IoT devices. It is especially relevant to low-end devices that are incapable of protecting themselves against infection. Malware that is aware of ongoing or impending attestation and aims to avoid detection can relocate itself during computation of the attestation measurement. In ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016